VaiPhy: a Variational Inference Based Algorithm for Phylogeny (Appendix)

Neural Information Processing Systems

Hence, during the training of VaiPhy, we used a maximum-likelihood heuristic to update the branch lengths given a tree topology. The branch lengths of the NJ tree are optimized with the same software. The optimized branch lengths are used as the initial set of lengths for E(τ). In all of the figures, the left column is the current state of τ, the middle column shows the two trees being compared, and the right column is the selected tree. Solid lines indicate the edges in τ, and bold green lines are accepted edges (edges in M).
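The maximum-likelihood branch-length heuristic itself is not reproduced in this extract. As a self-contained illustration of ML branch-length estimation (not VaiPhy's actual update), the Jukes-Cantor (JC69) model admits a closed-form ML length for a single branch, t = -(3/4) ln(1 - 4p/3), where p is the fraction of differing sites; the function name below is hypothetical:

```python
import math

def jc69_ml_branch_length(seq_a: str, seq_b: str) -> float:
    """ML branch length between two aligned sequences under JC69.

    Closed form: t = -(3/4) * ln(1 - 4p/3), where p is the fraction
    of sites at which the sequences differ (requires p < 0.75).
    """
    assert len(seq_a) == len(seq_b), "sequences must be aligned"
    diffs = sum(a != b for a, b in zip(seq_a, seq_b))
    if diffs == 0:
        return 0.0
    p = diffs / len(seq_a)
    if p >= 0.75:
        raise ValueError("sequences too divergent for JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# one mismatch out of four sites: p = 0.25
print(jc69_ml_branch_length("AAAA", "AAAC"))
```

This illustrates why estimated lengths exceed the raw mismatch fraction: the log correction accounts for unobserved multiple substitutions at the same site.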




A Training Configurations

Neural Information Processing Systems

We summarize the data statistics in our experiments in Table 1. For both fully and semi-supervised node classification tasks on the citation networks Cora, Citeseer, and Pubmed, we train our DGC following the hyperparameters in SGC [5]. Specifically, we train DGC for 100 epochs using Adam [2] with learning rate 0.2. For weight decay, as in SGC, we tune this hyperparameter on each dataset using hyperopt [1] for 10,000 trials. For the large-scale inductive learning task on the Reddit network, we also follow the protocols of SGC [5], where we use the L-BFGS [3] optimizer for 2 epochs with no weight decay.
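The SGC-style pipeline this configuration refers to precomputes K steps of feature propagation once and then fits a linear classifier on the result. A minimal pure-Python sketch of that precomputation, using a row-normalized adjacency with self-loops on a toy graph (function and variable names are illustrative, not from the paper):

```python
def propagate(features, adj, k):
    """K-step feature propagation S^K X with a row-normalized
    adjacency including self-loops: S[i][j] = A~[i][j] / deg(i).
    SGC-style models precompute this once, then fit a linear model."""
    n = len(features)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = set(adj[i]) | {i}          # add self-loop
        for j in nbrs:
            s[i][j] = 1.0 / len(nbrs)     # row-normalize
    x = [list(row) for row in features]
    for _ in range(k):                    # repeated smoothing
        x = [[sum(s[i][j] * x[j][d] for j in range(n))
              for d in range(len(x[0]))] for i in range(n)]
    return x

# toy path graph 0-1-2 with scalar features
feats = [[1.0], [0.0], [0.0]]
adj = [[1], [0, 2], [1]]
out = propagate(feats, adj, 2)
print(out)
```

Because the propagation has no trainable parameters, the subsequent classifier training (Adam, L-BFGS, etc.) only ever sees the precomputed features, which is what makes the 2-epoch Reddit setting feasible.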



Depth-Aware Initialization for Stable and Efficient Neural Network Training

Pandey, Vijay

arXiv.org Artificial Intelligence

In the past few years, various initialization schemes have been proposed, including Glorot initialization, He initialization, initialization using orthogonal matrices, and the random-walk initialization method. Some of these methods emphasize keeping unit variance of activations and gradients as they propagate through the network layers. Some are independent of depth information, while others take the total network depth into account for better initialization. In this paper, a comprehensive study is conducted in which the depth of each layer, as well as the total network depth, is incorporated into the initialization scheme. It is also shown that for deeper networks, the theoretical assumption of unit variance throughout the network does not perform well; instead, the variance needs to increase from the first-layer activations to the last-layer activations. We propose a novel and flexible way to increase the variance of the network that incorporates the depth of each layer. Experiments show that the proposed method outperforms existing initialization schemes.
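To make the idea concrete, here is a hedged sketch of a fan-in (He-style) initializer with a hypothetical per-layer gain that lets the weight (and hence activation) variance grow gradually with depth. The exact functional form `growth ** (layer / total_layers)` is an illustration of the general idea, not the paper's actual scheme:

```python
import math
import random

def depth_scaled_init(fan_in, fan_out, layer, total_layers, growth=1.05):
    """He-style fan-in initialization with a hypothetical per-layer gain:
    std = sqrt(2 / fan_in) * growth ** (layer / total_layers).
    growth > 1 makes variance increase slightly from early to late layers,
    instead of holding it at exactly unit variance throughout."""
    std = math.sqrt(2.0 / fan_in) * growth ** (layer / total_layers)
    return [[random.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

random.seed(0)
w = depth_scaled_init(256, 256, layer=3, total_layers=10)
flat = [v for row in w for v in row]
emp_std = math.sqrt(sum(v * v for v in flat) / len(flat))
print(round(emp_std, 4))  # empirically close to sqrt(2/256) * 1.05 ** 0.3
```

Setting `growth=1.0` recovers plain He initialization, so the depth-aware term can be ablated directly.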


Supplementary Materials for Descent Steps of a Relation-Aware Energy Produce Heterogeneous Graph Neural Networks

Neural Information Processing Systems

vec(XYZ) = (Z^T ⊗ X) vec(Y)    (2)

We now proceed with the proof of our result. Note that we apply Roth's column lemma to (11) to derive (12). Work completed during an internship at the AWS Shanghai AI Lab. All GNN layers use 16 hidden dimensions. Table 1: Results using different base models (left) and test-time comparisons (right).
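Roth's column lemma, vec(XYZ) = (Z^T ⊗ X) vec(Y) with column-major vectorization, can be checked numerically. A minimal pure-Python sketch on 2x2 matrices (helper names are ours, not from the supplement):

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def vec(m):
    """Column-major vectorization: stack the columns of m."""
    return [m[i][j] for j in range(len(m[0])) for i in range(len(m))]

def kron(a, b):
    """Kronecker product a ⊗ b."""
    ra, rb, cb = len(a), len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb]
             for j in range(len(a[0]) * cb)] for i in range(ra * rb)]

def transpose(m):
    return [list(col) for col in zip(*m)]

X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[0.5, -1.0], [2.0, 0.0]]
Z = [[1.0, 1.0], [0.0, 2.0]]

lhs = vec(matmul(matmul(X, Y), Z))               # vec(XYZ)
K = kron(transpose(Z), X)                        # Z^T ⊗ X
rhs = [sum(K[i][j] * vec(Y)[j] for j in range(4)) for i in range(4)]
print(lhs, rhs)  # the two 4-vectors agree
```

The lemma is what lets a matrix equation in Y be rewritten as a linear system in vec(Y), which is the step used to pass from (11) to (12).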


VaiPhy: a Variational Inference Based Algorithm for Phylogeny, Appendix A: The VaiPhy Algorithm

Neural Information Processing Systems

The update equations of VaiPhy follow the standard mean-field VI updates. Furthermore, −i denotes the set of all nodes except node i, and C is a constant. We utilize the NJ algorithm to initialize VaiPhy with a reasonable state. An example script to run PhyML is shown below. Here we provide two algorithmic descriptions of SLANTIS.
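The PhyML script referenced above is not reproduced in this extract. A hedged sketch of what such an invocation might look like, optimizing branch lengths and rate parameters on a fixed input topology (the file names are placeholders, and flags should be checked against the PhyML manual for the version used):

```shell
# Optimize branch lengths (l) and rate parameters (r) of a fixed
# user-supplied topology; alignment.phy and nj_tree.nwk are placeholders.
phyml --input alignment.phy \
      --datatype nt \
      -u nj_tree.nwk \
      -o lr \
      -b 0          # no bootstrap replicates
```

With `-o lr` the topology from the NJ tree is kept fixed, matching the setup described above where only branch lengths are re-optimized.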


Implicit vs Unfolded Graph Neural Networks

Yang, Yongyi, Liu, Tang, Wang, Yangkun, Huang, Zengfeng, Wipf, David

arXiv.org Artificial Intelligence

It has been observed that message-passing graph neural networks (GNNs) sometimes struggle to efficiently and scalably model long-range dependencies across nodes while avoiding unintended consequences such as oversmoothed node representations, sensitivity to spurious edges, or inadequate model interpretability. To address these and other issues, two separate strategies have recently been proposed, namely implicit and unfolded GNNs (which we abbreviate as IGNN and UGNN respectively). The former treats node representations as the fixed points of a deep equilibrium model that can efficiently facilitate arbitrary implicit propagation across the graph with a fixed memory footprint. In contrast, the latter treats graph propagation as unfolded descent iterations applied to some graph-regularized energy function. While motivated differently, in this paper we carefully quantify explicit situations where the solutions they produce are equivalent and others where their properties sharply diverge. This includes the analysis of convergence, representational capacity, and interpretability. In support of this analysis, we also provide empirical head-to-head comparisons across multiple synthetic and public real-world node classification benchmarks. These results indicate that while IGNN is substantially more memory-efficient, UGNN models support unique, integrated graph attention mechanisms and propagation rules that can achieve strong node classification accuracy across disparate regimes such as adversarially perturbed graphs, graphs with heterophily, and graphs involving long-range dependencies.
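In the simplest linear case the two views coincide: iterating z ← (1 − α)Sz + αx to convergence (implicit fixed point) and running K truncated unfolded steps of the same contraction both approach the same solution as K grows. A hedged toy sketch of this equivalence on a two-node graph (all names and the specific update are illustrative, not the paper's models):

```python
def propagate_to_fixed_point(x, s, alpha=0.1, tol=1e-10, max_iter=10000):
    """IGNN-style view: iterate z <- (1 - alpha) * S z + alpha * x until
    convergence; a contraction for row-stochastic S and 0 < alpha <= 1."""
    n = len(x)
    z = list(x)
    for _ in range(max_iter):
        z_new = [(1 - alpha) * sum(s[i][j] * z[j] for j in range(n))
                 + alpha * x[i] for i in range(n)]
        if max(abs(a - b) for a, b in zip(z, z_new)) < tol:
            return z_new
        z = z_new
    return z

def unfold_k_steps(x, s, alpha=0.1, k=200):
    """UGNN-style view: the same update, truncated at K descent steps."""
    n = len(x)
    z = list(x)
    for _ in range(k):
        z = [(1 - alpha) * sum(s[i][j] * z[j] for j in range(n))
             + alpha * x[i] for i in range(n)]
    return z

# toy 2-node graph with a row-stochastic propagation matrix
s = [[0.5, 0.5], [0.5, 0.5]]
x = [1.0, 0.0]
zf = propagate_to_fixed_point(x, s)
zk = unfold_k_steps(x, s)
print(zf, zk)  # nearly identical: unfolding converges to the implicit fixed point
```

The divergences the paper analyzes arise once nonlinearities, attention, or finite K enter: the truncated unfolded iterates are then no longer fixed points of any implicit model, and vice versa.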